Optimal density estimation in data containing clusters of unknown structure

نویسنده

  • Alfred Ultsch
چکیده

A method for measuring the density of data sets that contain an unknown number of clusters of unknown sizes is proposed. This method, called Pareto Density Estimation (PDE), uses hyper spheres to estimate data density. The radius of the hyper spheres is derived from information optimal sets. PDE leads to a tool for the visualization of probability density distributions of variables (PDEplot). For Gaussian mixture data this is an optimal empirical density estimation. A new kind of visualization of the density structure of high dimensional data set, the PMatrix is defined. The P-Matrix for a 79dimensional data set from DNA array analysis is shown. The P-Matrix reveals local concentrations of data points representing similar gene expressions. The P-Matrix is also a very effective tool in the detection of clusters and outliers in unknown data sets.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Density Estimation and Visualization for Data Containing Clusters of Unknown Structure

A method for measuring the density of data sets that contain an unknown number of clusters of unknown sizes is proposed. This method, called Pareto Density Estimation (PDE), uses hyper spheres to estimate data density. The radius of the hyper spheres is derived from information optimal sets. PDE leads to a tool for the visualization of probability density distributions of variables (PDEplot). F...

متن کامل

Pareto Density Estimation: A Density Estimation for Knowledge Discovery

Pareto Density Estimation (PDE) as defined in this work is a method for the estimation of probability density functions using hyperspheres. The radius of the hyperspheres is derived from optimizing information while minimizing set size. It is shown, that PDE is a very good estimate for data containing clusters of Gaussian structure. The behavior of the method is demonstrated with respect to clu...

متن کامل

Estimation of geochemical elements using a hybrid neural network-Gustafson-Kessel algorithm

Bearing in mind that lack of data is a common problem in the study of porphyry copper mining exploration, our goal was set to identify the hidden patterns within the data and to extend the information to the data-less areas. To do this, the combination of pattern recognition techniques has been used. In this work, multi-layer neural network was used to estimate the concentration of geochemical ...

متن کامل

خوشه‌بندی خودکار داده‌ها با بهره‌گیری از الگوریتم رقابت استعماری بهبودیافته

Imperialist Competitive Algorithm (ICA) is considered as a prime meta-heuristic algorithm to find the general optimal solution in optimization problems. This paper presents a use of ICA for automatic clustering of huge unlabeled data sets. By using proper structure for each of the chromosomes and the ICA, at run time, the suggested method (ACICA) finds the optimum number of clusters while optim...

متن کامل

Syllable structure in Old, Middle and Modern Persian: A contrastive analysis

Evolution of languages has always been of interest to linguists.  In this paper we study  the natural progress of the syllable structure from Old  Persian  (O.P)  to Middle Persian (Mi.P) and up to the Modern Persian (Mo.P). For this purpose all the words containing consonant sequences are collected from specific sources of each  of these  languages,  and then  analysed  according to the syllab...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003